File uploads

Note that the File Uploads tab is shown only if the task is defined with an endpoint that supports this feature.

Click the Optimize File Uploads button to improve performance when replicating to file-based targets such as Amazon S3 and Hadoop. When this feature is enabled, the button text changes to Disable File Upload Optimization; click that button to turn the optimization off.

The upload mode depends on the task type:

  • Full Load - Multiple files created from the same table are transferred in parallel, in no particular order.
  • Apply Changes - Files created from multiple tables are transferred in parallel. Files created from the same table are transferred sequentially according to creation time.
  • Change Data Partitioning - Files created from multiple tables and files created from the same table are transferred in parallel.
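
The ordering rules above can be illustrated with a minimal Python sketch. It is not how the product schedules uploads; it only models the stated rules by grouping files into "lanes", where separate lanes may be transferred in parallel and files within one lane are transferred in order. The names PendingFile, plan_lanes, and the mode strings are hypothetical, and treating every Full Load file as its own lane is an assumption.

```python
from collections import defaultdict
from typing import NamedTuple


class PendingFile(NamedTuple):
    """Hypothetical record for a file waiting to be uploaded."""
    table: str
    name: str
    created: float  # creation timestamp


def plan_lanes(files, mode):
    """Group files into lanes.

    Lanes may run in parallel; files inside a lane run in list order.
    """
    if mode in ("full_load", "change_data_partitioning"):
        # No ordering guarantee: every file can be transferred independently.
        return [[f] for f in files]
    if mode == "apply_changes":
        # One lane per table, ordered by creation time within the table.
        by_table = defaultdict(list)
        for f in files:
            by_table[f.table].append(f)
        return [sorted(group, key=lambda f: f.created)
                for group in by_table.values()]
    raise ValueError(f"unknown mode: {mode}")


files = [
    PendingFile("orders", "o2", 2.0),
    PendingFile("orders", "o1", 1.0),
    PendingFile("customers", "c1", 1.5),
]
lanes = plan_lanes(files, "apply_changes")
```

In this model, Apply Changes yields one lane per table, so files from the same table keep their creation order while different tables proceed independently.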

Note that if you disable this option after the task has already started, you must do one of the following:

  • If the task is in the Full Load stage, reload the target using the Reload Target Run option.
  • If the task is in the Change Processing stage, resume the task using the Start processing changes from Run option.
Information note
  • Supported by the following target endpoints only: Amazon S3, Hadoop (Hortonworks and Cloudera), Microsoft Azure ADLS, Databricks (Cloud Storage), Microsoft Azure HDInsight, Hortonworks Data Platform (HDP), Google Cloud Storage, Google Cloud Dataproc, Amazon EMR, and Cloudera Data Platform (CDP) Private Cloud.

  • General Limitations and Considerations:
    • Post Upload Processing endpoint settings are not supported.

  • Hadoop - Limitations and Considerations:
    • When replicating to a Hadoop target, only Text and Sequence file formats are supported.
    • Hive jobs are not supported because they prevent the files from being uploaded.
    • Append is not supported when using Text file format.
  • Amazon S3 and Microsoft Azure ADLS - Limitations and Considerations:
    • When working with Reference Files, a new entry is added to the Reference File immediately after the data file is uploaded (even if the DFM file has not been uploaded yet).
    • The existence of the DFM file does not necessarily mean that the associated data file has also been uploaded.
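
Because a data file and its DFM file can arrive in either order, a downstream consumer may want to wait until both are present before processing. The following Python sketch shows one way to check this against a listing of uploaded object keys. The function name complete_pairs is hypothetical, and the sketch assumes that a DFM file shares its data file's name with a .dfm extension and that data files end in .csv; adjust both to match your target's actual naming.

```python
from pathlib import PurePosixPath


def complete_pairs(keys, data_suffix=".csv", dfm_suffix=".dfm"):
    """Return data file keys whose matching DFM key is also present.

    Assumes (hypothetically) that the DFM key is the data file key with
    its extension replaced by dfm_suffix.
    """
    present = set(keys)
    ready = []
    for key in sorted(present):
        if key.endswith(data_suffix):
            dfm_key = str(PurePosixPath(key).with_suffix(dfm_suffix))
            if dfm_key in present:
                ready.append(key)
    return ready


keys = [
    "orders/LOAD00000001.csv",
    "orders/LOAD00000001.dfm",
    "orders/LOAD00000002.csv",  # DFM not uploaded yet
    "orders/LOAD00000003.dfm",  # data file not uploaded yet
]
ready = complete_pairs(keys)
```

Only the first data file is reported as ready here, because it is the only one with both halves of the pair uploaded.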

 
